skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Yang, Wooseong"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. This article introduces ConfliBERT-Spanish, a pre-trained language model specialized in political conflict and violence for text written in the Spanish language. Our methodology relies on a large corpus specialized in politics and violence to extend the capacity of pre-trained models capable of processing text in Spanish. We assess the performance of ConfliBERT-Spanish in comparison to Multilingual BERT and BETO baselines for binary classification, multi-label classification, and named entity recognition. Results show that ConfliBERT-Spanish consistently outperforms baseline models across all tasks. These results show that our domain-specific language-specific cyberinfrastructure can greatly enhance the performance of NLP models for Latin American conflict analysis. This methodological advancement opens vast opportunities to help researchers and practitioners in the security sector to effectively analyze large amounts of information with high degrees of accuracy, thus better equipping them to meet the dynamic and complex security challenges affecting the region. 
    more » « less
  2. Mitkov, Ruslan; Angelova, Galia (Ed.)
    This study investigates the use of Natural Language Processing (NLP) methods to analyze politics, conflicts and violence in the Middle East using domain-specific pre-trained language models. We introduce Arabic text and present ConfliBERT-Arabic, a pre-trained language models that can efficiently analyze political, conflict and violence-related texts. Our technique hones a pre-trained model using a corpus of Arabic texts about regional politics and conflicts. Performance of our models is compared to baseline BERT models. Our findings show that the performance of NLP models for Middle Eastern politics and conflict analysis are enhanced by the use of domain-specific pre-trained local language models. This study offers political and conflict analysts, including policymakers, scholars, and practitioners new approaches and tools for deciphering the intricate dynamics of local politics and conflicts directly in Arabic. 
    more » « less